An efficient random mutagenesis procedure coupled to a replica
plate screen facilitated the isolation of mutant subtilisins
from Bacillus amyloliquefaciens that had altered autolytic
stability under alkaline conditions. Out of about 4000 clones
screened, approximately 70 produced subtilisins with reduced
stability (negatives). Two clones produced a more stable
subtilisin (positives) and were identified as having a single
mutation, either Ile107Val or Lys213Arg (the wild-type amino
acid is followed by the codon position and the mutant amino
acid). One of the negative mutants, Met50Val, was at a site
where other homologous subtilisins contained a Phe. When
the Met50Phe mutation was introduced into the B. amyloliquefaciens
gene, the mutant subtilisin was more alkaline
stable. The double mutant (Ile107Val/Lys213Arg) was more
stable than the isolated single mutant parents. The triple
mutant (Met50Phe/Ile107Val/Lys213Arg) was even more
stable than Ile107Val/Lys213Arg (up to two times the autolytic
half-time of wild-type at pH 12). These studies demonstrate
the feasibility of improving the alkaline stability of proteins
by random mutagenesis and of identifying potential sites where
substitutions from homologous proteins can improve alkaline
stability.
Introduction

The utility of industrial enzymes often depends on their stability
in the presence of oxidants, organic solvents, extremes of pH
and/or temperature. Subtilisin is a serine endopeptidase (for review
see Markland and Smith, 1971) that is used industrially, often
in the presence of oxidants or under extremely alkaline conditions.
Replacement of an oxidatively sensitive methionine residue by
cassette mutagenesis (Wells et al., 1985) of the cloned subtilisin
gene from Bacillus amyloliquefaciens (Wells et al., 1983) produced
mutant subtilisins that are resistant to oxidative inactivation
by hydrogen peroxide (Estell et al., 1985). However, under
prolonged incubation at high pH, subtilisin undergoes autolytic
inactivation which may be related to the conformational stability
of the molecule (Ottesen and Svendsen, 1970).

Protein conformational stability is determined by a complicated
mixture of covalent and non-covalent binding forces. Our rudimentary
understanding of these stabilizing interactions, and the
typically small stabilization energy (5-15 kcal/mol) that distinguishes
the folded and unfolded state(s) of proteins (for review
see Pfeil, 1981; Creighton, 1983), currently pose a difficult
challenge to the rational design of protein stability. While it has
been possible to increase the thermal stability of T4 lysozyme
to irreversible inactivation (Perry and Wetzel, 1984) and dihydrofolate
reductase to reversible denaturation (Villafranca et al.,
1983) by introduction of disulfide bonds, this strategy has had
mixed success in improving the thermal stability of subtilisin
toward autolytic inactivation (Wells and Powers, 1986; Pantoliano
et al., 1987). Another rational approach to protein stability has
been to introduce mutations that increase stability of secondary
structural units such as α-helices (Hecht et al., 1986; Mitchinson
and Baldwin, 1986; Matthews et al., 1987).

In contrast to the above site-specific approaches, the most widely
used strategy is to couple random mutagenesis with a suitable
screen or selection to isolate point mutants that are more thermally
stable. This strategy has been used to increase the thermal
stability of the amino-terminal domain of phage lambda repressor
(Hecht et al., 1984), staphylococcal nuclease (Shortle and Lin,
1985; Shortle and Meeker, 1986; Shortle, 1986), T4 lysozyme
(Alber and Wozniak, 1985) and kanamycin nucleotidyl
transferase (Matsumura and Aiba, 1985; Liao et al., 1986). In
some cases these point mutants have cumulative effects on protein
stability (Shortle, 1986; Shortle and Meeker, 1986; Matsumura
et al., 1986; Liao et al., 1986; Hecht et al., 1986).

Here we describe an improved random mutagenesis and screen- 
ing procedure that facilitated the isolation of mutant subtilisins 
which are more autolytically stable (in a cumulative fashion) 
under alkaline conditions. Thus, the stability of this protein to 
extreme alkaline conditions is not optimal and can be increased 
by random mutagenesis. 

Materials and methods 

Reagents 

Resolved Sp-diastereoisomers of deoxyguanosine 5'-O-(1-thiotriphosphate)
(dGTPαS) and dATPαS and racemic mixtures of
dTTPαS and dCTPαS were provided by Dr Phil Buzby (New
England Nuclear). ATP and deoxynucleotide triphosphates were
from PL Biochemicals. Dam methylase, S-adenosylmethionine,
T4 DNA kinase, T4 DNA ligase and all restriction endonucleases
were from New England Biolabs. DNA polymerase I large fragment
(Klenow) and AMV polymerase were from Boehringer-Mannheim
and Life Sciences Corp., respectively. Oligodeoxyribonucleotides
were provided by the Organic Chemistry Group
at Genentech.

Construction of random mutagenesis library 
A convenient E. coli-B. subtilis shuttle vector, pBO180 (Figure
1), was constructed to contain the 2.3 kb EcoRI-PvuII fragment
comprising the E. coli pBR322 origin and ampicillin resistance gene
(ampʳ) from pBR327 (Covarrubias et al., 1981), the 3.7 kb
EcoRI-BamHI fragment containing the pUB110 origin, the chloramphenicol
(cmpʳ) and neomycin (neoʳ) resistance genes from
pBD64 (Gryczan et al., 1980) and the 1.5 kb EcoRI-BamHI
subtilisin gene fragment (Wells et al., 1983). In addition, a unique
and silent KpnI site at codon 166 was introduced into the subtilisin
gene by site-directed mutagenesis (Zoller and Smith, 1982)
to facilitate sub-cloning of mutant subtilisins. DNA fragments
to be ligated were produced by restriction digestion of plasmid
DNA (0.2 to 1 µg) and purified from 0.8% low gel temperature
agarose (BioRad) in TAE buffer (Maniatis et al., 1982) under
sterile conditions. Gel slices containing relevant DNA fragments
were melted by heating to 68°C for 5 min in 3.5 volumes of
H2O. DNA fragments were ligated in the agarose solution by
incubation with T4 DNA ligase (100 units/ml final) in ligase buffer
(New England Biolabs) at 24°C for 1-16 h. Agarose gel ligation
mixtures were added to two volumes of freshly competent E.
coli LE392 cells (Enquist and Weisberg, 1977) for transformation
according to Mandel and Higa (1970).

Deoxyuridine-containing template DNA from M13mp11 SUBT
(Wells and Powers, 1986) was prepared according to Kunkel
(1985), except that a more stable E. coli strain (B0265) was used
to produce DNA. B0265 was constructed by mating E. coli
RZ1032 (tetʳ; Kunkel, 1985) with K5303 (cmpʳ, provided by Dr
Harvey Miller, Genentech), which contains a chloramphenicol
resistance gene on its F' episome. A conjugate host (tetʳ, cmpʳ)
was selected on LB plates containing 12.5 µg/ml chloramphenicol
and 20 µg/ml tetracycline. Template DNA was purified by CsCl
density gradients (Maniatis et al., 1982).

A primer (AvaI⁻) having the sequence 5' GAAAAAAGACQCXAQCGTCGCTTA 3',
ending at codon -11 within the
subtilisin gene, was used to alter a unique AvaI recognition
sequence without changing the protein sequence. (The asterisk
denotes the mismatch from the wild-type sequence and underlined
is the altered AvaI site.) The 5' phosphorylated AvaI⁻ primer
(~320 pmol) was annealed to the deoxyuridine-containing
M13mp11 SUBT template (~40 pmol) in 1.88 ml of 53 mM
NaCl, 7.4 mM MgCl2 and 7.4 mM Tris-HCl (pH 7.5) by
heating to 90°C for 2 min and cooling to 24°C over 15 min
(Figure 1). Primer extension was initiated at 24°C by addition
of 100 units of Klenow fragment plus 50 µM of dATP, dTTP,
dGTP and dCTP (final). Aliquots (50 µl) of the mixture were
sampled every 15 s over 10 min and extension reactions were
stopped by addition of 40 mM EDTA (final), pH 8.0. Pooled
DNA samples were extracted with phenol/CHCl3, precipitated
twice with ethanol (Maniatis et al., 1982), and redissolved in
0.4 ml 1 mM EDTA, 10 mM Tris-HCl, pH 8.

Separate misincorporation reactions for addition of each of the
four α-thiodeoxynucleotides onto the 3' ends of the randomly
terminated template pool (~20 µg) were carried out by reaction
with 0.25 mM of a given dNTPαS, 100 units AMV polymerase,
50 mM KCl, 10 mM MgCl2, 0.4 mM dithiothreitol and 50 mM
Tris (pH 8.3) in 0.2 ml total volume at 37°C for 90 min (Champoux,
1984). Extension from the site of misincorporation was
performed by reaction with 50 µM all four dNTPs (pH 8.0) and 50
units AMV polymerase at 37°C for 5 min. After ethanol precipitation,
closed circular heteroduplexes were synthesized by reaction
for 2 days at 14°C under the same conditions used for the
timed extension reactions above, except that the reactions contained
1000 units T4 DNA ligase, 0.5 mM ATP and 1 mM
2-mercaptoethanol.

Heteroduplex DNA in each reaction mixture was methylated
by incubation with 80 µM S-adenosylmethionine and 150 units
dam methylase for 1 h at 37°C. After heating at 68°C for 15 min,
half of each of the four methylated heteroduplex reactions was
transformed into 2.5 ml competent E. coli JM101 (Messing,
1979). The number of independent transformants from each of
the four transformations ranged from 0.4 to 2.0 × 10⁵. After
growing up phage pools, RF DNA from each of the four transformations
was isolated and purified by centrifugation through CsCl
density gradients. Approximately 2 µg of RF DNA from each
of the four pools was digested with EcoRI, BamHI and AvaI.
The 1.5 kb EcoRI-BamHI fragment (i.e. AvaI resistant) was
purified on low gel temperature agarose and ligated into the
5.5 kb EcoRI-BamHI vector fragment of pBO180. The total
number of independent transformants from each α-thiodeoxynucleotide
misincorporation plasmid library ranged from 1.2 to
2.4 × 10⁴. Plasmids from each of the four transformations
were purified by centrifugation through CsCl density gradients.
Expression and screening of subtilisin point mutants 
Pooled plasmid DNA was transformed (Anagnostopoulos and
Spizizen, 1961) into BG2036, a strain of B. subtilis deficient in
extracellular protease genes (Yang et al., 1984). For each
transformation, 5 µg of DNA produced approximately 2.5 × 10⁵
independent BG2036 transformants. Fresh transformants were
arrayed onto 96-well microtiter plates containing 150 µl per well
LB media plus 12.5 µg/ml chloramphenicol. After 1 h at room
temperature, a replica pattern was stamped (using a matched 96
prong replica stamp) onto a 132 mm BA 85 nitrocellulose filter
(Schleicher and Schuell) which was layered on a 140 mm
diameter agar plate containing LB media, 5 µg/ml chloramphenicol
and 1.6% skim milk (Wells et al., 1983). Cells were
grown for about 16 h at 30°C until halos of proteolysis were
roughly 5-7 mm in diameter. Filters were transferred directly
to freshly prepared agar plates at 37°C containing 1.6% skim
milk and 50 mM sodium phosphate pH 11.5, and incubated for
3-6 h at 37°C until wild-type subtilisin produced halos of about
5 mm. The plates were stained for 10 min at 24°C with
Coomassie blue solution (0.25% Coomassie blue R-250 in 25%
ethanol) and destained with 25% ethanol, 10% acetic acid for
20 min. Zones of proteolysis appeared as blue halos on a white
background on the underside of the plate and were compared
with the original LB-skim milk growth plate, which was similarly
stained and destained as a control. Clones that produced
proportionately larger zones of proteolysis on the high pH plates
relative to the original growth plate were considered 'positive'.
'Negative' clones gave smaller halos under alkaline conditions.
Positive and negative clones were restreaked to colony purify
and screened again in triplicate to confirm alkaline pH results.
Identification and analysis of mutant subtilisins 
Plasmid DNA from 5 ml overnight cultures of BG2036 clones
expressing mutant subtilisins was prepared according to Birnboim
and Doly (1979) with minor modifications: cells were
incubated with 2 mg/ml lysozyme for 5 min at 37°C to ensure
cell lysis, and an additional extraction with phenol/CHCl3 was
employed to remove contaminants. The 1.5 kb EcoRI-BamHI
fragment containing the subtilisin gene was ligated into M13mp11
and template DNA was prepared for DNA sequencing (Messing
and Vieira, 1982). Three DNA sequencing primers ending at
codons -26, +95 and +155 were synthesized to match the subtilisin
coding sequence. Single track DNA sequencing was used
for preliminary identification of mutants. For example, a G sequence
track was used to identify a mutant from the dGTPαS
library. Four track DNA sequencing was performed over the site
of mutagenesis to identify the mutant sequence (Sanger et al.,
1980).

Fig. 1. Strategy for producing point mutations in the subtilisin coding
sequence by misincorporation of α-thiodeoxynucleotide triphosphates. A
restriction primer designed to eliminate a unique AvaI restriction site within
the subtilisin gene was used to produce a set of randomly terminated primer
template heteroduplexes (indicated by t = 1 min, 5 min and 10 min).
α-Thiodeoxynucleotides were misincorporated (indicated by x) in four separate
reactions. After heteroduplex synthesis, in vitro methylation and
transformation, RF DNA pools from the four misincorporation reactions
were isolated. AvaI-resistant inserts were subcloned into pBO180 for
expression and screening in B. subtilis BG2036 (see Materials and methods
for details).

Confirmed positive and negative bacilli clones were cultured
in LB media containing 12.5 µg/ml chloramphenicol. Subtilisin
was purified from culture supernatants as previously described
(Estell et al., 1985). Enzymes were greater than 98% pure as
analyzed by SDS-polyacrylamide gel electrophoresis (Laemmli,
1970), and protein concentrations were calculated from the absorbance
at 280 nm (ε280 0.1% = 1.17; Matsubara et al., 1965).
Enzyme activity was measured with 200 µg/ml succinyl-L-Ala-
L-Ala-L-Pro-L-Phe-p-nitroanilide (Sigma) in 0.10 M Tris pH 8.6
or 0.10 M CAPS pH 10.8 at 25°C. Sp. act. (µmol product/
min/mg) was calculated from the change in absorbance at 410 nm
from production of p-nitroaniline (ε410 = 8480 M⁻¹ cm⁻¹; Del
Mar et al., 1979). Alkaline autolytic stability studies were performed
on purified enzymes (200 µg/ml) in 0.10 M potassium
phosphate (pH 12.0) at 25°C. At various times aliquots were
assayed for residual enzyme activity (Wells and Powers, 1986).
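
The specific activity calculation described above can be sketched in a few lines of Python. This is only an illustration of the Beer-Lambert arithmetic, assuming a 1 cm path length; the function name, its parameters and the example readings are hypothetical and are not taken from the original work.

```python
def specific_activity(delta_a410_per_min, enzyme_mg_per_ml,
                      epsilon=8480.0, path_cm=1.0):
    """Return specific activity in umol product/min/mg enzyme.

    delta_a410_per_min: rate of absorbance change at 410 nm (per minute)
    enzyme_mg_per_ml:   enzyme concentration in the assay cuvette (assumed input)
    epsilon:            extinction coefficient of p-nitroaniline (M^-1 cm^-1)
    path_cm:            optical path length (1 cm is an assumption)
    """
    # Beer-Lambert law: product formation rate in mol/L/min
    rate_molar_per_min = delta_a410_per_min / (epsilon * path_cm)
    # 1 mol/L equals 1000 umol/ml
    rate_umol_per_min_per_ml = rate_molar_per_min * 1000.0
    return rate_umol_per_min_per_ml / enzyme_mg_per_ml


# Hypothetical reading: 0.0848 absorbance units/min with 0.001 mg/ml enzyme
print(specific_activity(0.0848, 0.001))
```

Relative specific activities can then be obtained by dividing each mutant value by the wild-type value measured at the same pH and multiplying by 100.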
Results 

Optimization and analysis of mutagenesis frequency 
A set of primer-template molecules that were randomly 3'-terminated
over the mature subtilisin coding sequence (Figure 1)
was produced by stopping polymerase reactions with EDTA after
various times of extension from a fixed primer. The extent and
distribution of duplex formation over the subtilisin gene fragment
(1 kb) was assessed by multiple restriction digestion (not shown).
For example, production of new HinfI fragments identified when
polymerase extension had proceeded past Ile110, Leu233 and
Asp259 in the subtilisin gene. The efficiency of each misincorporation
reaction was estimated to be greater than 80% by the
addition of each dNTPαS to the AvaI restriction primer and
analysis by polyacrylamide gel electrophoresis (Hillebrand
et al., 1984). [Although it is possible to produce misincorporations
with normal deoxynucleotides (Zakour et al., 1984;
Champoux, 1984), the α-thiophosphate linkage is resistant to
exonucleolytic cleavage by E. coli DNA polymerase I (Shortle
et al., 1982). Therefore, we reasoned (but have not shown) that
a misincorporated α-thiophosphate deoxynucleotide would be
more stably retained in vivo compared to a normal phosphate
deoxynucleotide.]

Several manipulations were employed to maximize the yield
of the mutant sequences in the heteroduplex. These included the
use of a deoxyuridine-containing template (Kunkel, 1985), in vitro
methylation of the mutagenic strand (Pukkila et al., 1983; Horton
and Lord, 1986), and the use of AvaI restriction-selection
against the wild-type template strand which contained a unique
AvaI site. The separate contribution of each enrichment procedure
to the final mutagenesis frequency was not determined, except
that prior to AvaI restriction-selection ~30% of the segregated
clones in each of the four pools still retained a wild-type AvaI
site within the subtilisin gene. After AvaI restriction-selection
>98% of the plasmids lacked the wild-type AvaI site. Subcloning
of the AvaI-resistant EcoRI-BamHI subtilisin gene fragment
by agarose gel purification, and in situ ligation with a
similarly cut pBO180 vector fragment, allowed large numbers of
recombinants to be obtained (>100 000 per µg equivalent of input
M13 DNA).

The frequency of mutagenesis for each of the four dNTPαS
misincorporation reactions was estimated from the frequency with
which unique restriction sites were eliminated (Table I). Unique
restriction sites chosen for this analysis within the subtilisin gene were
ClaI, PvuII and KpnI, located at codons 35, 104 and 166, respectively.
As a negative control, the mutagenesis frequency was determined
at the PstI site located in the β-lactamase gene, which is
outside the window of mutagenesis. Because the absolute mutagenesis
frequency was close to the percentage of undigested plasmid
DNA remaining after a single round of restriction digestion,
two rounds of restriction-selection were necessary to reduce the
background of surviving uncut wild-type plasmid below the mutant
plasmid (Table I). The background of surviving plasmid from
wild-type DNA probably represents the sum total of spontaneous
mutations and residual undigested wild-type plasmid. Subtracting
the frequency for unmutagenized DNA (background) from
the frequency for mutant DNA, and normalizing for the window
of mutagenesis sampled by a given restriction analysis (4-6 bp),
provides a rough estimate of the mutagenesis efficiency over the
entire subtilisin coding sequence (~1000 bp).
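
As a rough illustration of the normalization just described, the following Python sketch subtracts the background frequency and scales the result by the ratio of the coding sequence length to the window sampled. The function name and the 4 bp window used in the example are assumptions; the two input percentages are the PvuII totals for the dGTPαS library and for unmutagenized DNA as read from Table I.

```python
def mutants_per_gene(pct_resistant, pct_background, window_bp, gene_bp=1000):
    """Estimate the percentage of inserts carrying a mutation anywhere in the gene.

    pct_resistant:  % restriction-resistant clones from mutagenized DNA
    pct_background: % restriction-resistant clones from unmutagenized DNA
    window_bp:      number of base pairs sampled by the restriction site (4-6)
    gene_bp:        length of the coding sequence (about 1000 bp here)
    """
    # Frequency attributable to misincorporation, never below zero
    over_background = max(pct_resistant - pct_background, 0.0)
    # Scale from the small restriction-site window to the whole gene
    return over_background * gene_bp / window_bp


# PvuII site, dGTPaS library: 0.37% resistant versus 0.023% background
print(round(mutants_per_gene(0.37, 0.023, window_bp=4), 2))
```

Because the estimate scales inversely with the assumed window, a change from 4 to 6 bp lowers the result by one third, which is one reason the figure should be read as approximate.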

From this analysis, the average percentage of subtilisin gene
inserts containing mutations that resulted from dGTPαS, dCTPαS
or dTTPαS misincorporation was estimated to be 90, 70 and 20%,
respectively. These high mutagenesis frequencies were generally
quite variable depending upon the dNTPαS and misincorporation
efficiencies at this site (Table I). For instance, misincorporation
efficiency has been reported to be dependent both on the kind
of mismatch and on the context of the primer (Champoux, 1984;
Skinner and Eperon, 1986). Furthermore, biased misincorporation
efficiency of dGTPαS and dCTPαS over dTTPαS has been previously
observed (Shortle and Lin, 1985). Unlike the dGTPαS,
dCTPαS and dTTPαS libraries, the efficiency of mutagenesis for
the dATPαS misincorporation library could not be accurately
assessed because 90% of the restriction-resistant plasmids analyzed
simply lacked the subtilisin gene insert. The basis of this
result is not clearly understood. However, correcting for
this background, we estimate the mutagenesis frequency in the
dATPαS misincorporation library at around 20%. In a separate
experiment (not shown), the mutagenesis efficiencies for dGTPαS
and one other dNTPαS misincorporation reaction were estimated
to be around 50 and 30%, respectively, based on the frequency of
reversion of an inactivating mutation at codon 169.

B.C. Cunningham and J. A. Wells

Table I. Estimation of mutagenesis frequencies by restriction-site selection a

dNTPαS             Restriction    % resistant clones c             % resistant    % mutants
misincorporated b  site mutated   1st round  2nd round  Total      clones over    per 1000 bp
                                                                   background

None               PstI           0.32       0.7        0.002      0
G                  PstI           0.33       1.0        0.003      0.001          0.2
T                  PstI                      <0.5       <0.002     0              0
C                  PstI                      3.0        0.013      0.001          3

None               ClaI           0.28       5          0.014      0
G                  ClaI           2.26       85         1.92       1.91           380
T                  ClaI           0.48       31         0.15       0.14           35
C                  ClaI           U.DD       15         0.08       0.066          17

None               PvuII                     29         0.023      0
G                  PvuII          0.41       90         0.37       0.35           88
T                  PvuII          0.10       67         0.067      0.044          9
C                  PvuII          0.76       53         0.40       0.38           95

None               KpnI           0.41       3          0.012      0
G                  KpnI           0.98       35         0.34       0.33           83
T                  KpnI           0.36       15         0.054      0.042          8
C                  KpnI           1.47       26         0.38       0.37           93

The location of a mutation was readily determined by single
track DNA sequencing over the entire gene that corresponded
to the misincorporated α-thiodeoxynucleotide. The identity of the
mutation was determined by four track DNA sequencing focused
over the site of mutation. Of 14 mutants identified, the distribution
was similar to that reported by Shortle and Lin (1985),
except that we did not observe nucleotide insertion or deletion
mutations. There was a bias for A → G mutations in the G misincorporation
library, and some unexpected point mutations appeared
in the dTTPαS and dCTPαS libraries.

Screening and identification of alkaline stability mutants of subtilisin

Two problems were posed by screening colonies under highly
alkaline conditions (>pH 11). Because B. subtilis will not grow
at high pH, it was necessary to grow colonies on filters at neutral
pH to produce subtilisin, and subsequently transfer filters to skim
milk plates at pH 11.5 to assay subtilisin activity (Wells et al.,
1983). At pH 11.5 the casein micelles no longer formed
a turbid background, which prevented a clear observation of
proteolysis halos. This problem was overcome by briefly staining
the plate with Coomassie brilliant blue (R-250) to amplify proteolysis
zones and acidifying the plates to develop casein micelle
turbidity. By comparing the halo size produced on the reference
growth plate (pH 7.0) with the high pH plate (pH 11.5), it was
possible to identify mutant subtilisins that had increased (positives)
or decreased (negatives) stabilities under alkaline conditions (not
shown).
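
The comparison of halo sizes on the two plates can be expressed as a simple ratio test. The Python sketch below is purely illustrative: the halos in this work were compared by eye, so the function name, the numerical tolerance and the example diameters are all assumptions introduced here to make the logic explicit.

```python
def classify_clone(halo_ph7_mm, halo_ph11_mm, wt_ph7_mm, wt_ph11_mm,
                   tolerance=0.15):
    """Classify a clone from halo diameters on the growth and high pH plates.

    A clone is scored relative to wild-type so that differences in
    expression level (reflected in the pH 7 halo) cancel out.
    The 15% tolerance is an arbitrary, hypothetical cut-off.
    """
    clone_ratio = halo_ph11_mm / halo_ph7_mm   # retained activity of the clone
    wt_ratio = wt_ph11_mm / wt_ph7_mm          # retained activity of wild-type
    relative = clone_ratio / wt_ratio
    if relative > 1.0 + tolerance:
        return "positive"   # proportionately larger halo at high pH
    if relative < 1.0 - tolerance:
        return "negative"   # proportionately smaller halo at high pH
    return "neutral"


# Hypothetical diameters (mm); wild-type gives 6 mm at pH 7 and 5 mm at pH 11.5
print(classify_clone(6.0, 7.0, 6.0, 5.0))
print(classify_clone(6.0, 2.0, 6.0, 5.0))
```

Normalizing to the growth plate is what lets the screen separate altered alkaline behaviour from simple differences in the amount of enzyme secreted.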

Approximately 1000 colonies were screened from each of the
four misincorporation libraries. The percentages of colonies
showing a differential loss of activity at pH 11.5 versus pH 7
were 1.4, 1.8, 1.4 and 0.6% of the colonies that expressed subtilisin
activity at neutral pH from the dGTPαS, dATPαS, dTTPαS
and dCTPαS libraries, respectively. Several of these negative
clones were sequenced and all were found to contain a single
base change as expected from the misincorporation library from
which they came. Negative mutants included Asp36Ala,
Lys170Glu and Met50Val (the wild-type amino acid is followed
by the codon position and the mutant amino acid). Two positive
mutants were identified as Ile107Val and Lys213Arg. The ratio
of negatives to positives was roughly 50:1.

Stability and activity of subtilisin mutants at alkaline pH 
Subtilisin mutants were purified and their autolytic stabilities were
measured by the time course of inactivation at pH 12.0 (Figure
2, Table II). At the termination of each autolysis study, SDS-PAGE
analysis confirmed (by disappearance of the 27.5 kd subtilisin
band) that the subtilisin variant had autolyzed to an extent
consistent with the remaining enzyme activity. Positive mutants
identified from the screen (Ile107Val and Lys213Arg) were more
resistant to alkaline induced autolytic inactivation compared with
wild-type; negative mutants (Lys170Glu and Met50Val) were less
resistant. A Met50Phe mutant produced by site-directed mutagenesis
(D.Powers and J.A.W., unpublished) was more stable
than wild-type enzyme to alkaline autolytic inactivation (Figure
2B).

Fig. 2. Autolytic stability of purified wild-type and mutant subtilisins (200 µg/ml) in 100 mM potassium phosphate pH 12 at 25°C. After incubation for
the indicated times, residual enzyme activity was measured as described in Materials and methods. Panel A: wild-type subtilisin (■-■); Ile107Val (□-□);
Lys213Arg (△-△); Lys170Glu (○-○); Ile107Val/Lys213Arg (×-×). Panel B: wild-type subtilisin (■-■); Met50Val (□-□); Met50Phe (△-△);
Met50Phe/Ile107Val/Lys213Arg (×-×).

The stabilizing effects of Ile107Val, Lys213Arg and Met50Phe
were cumulative as determined from rates of alkaline autolysis
(Figure 2). The double mutant, Ile107Val/Lys213Arg, was more
stable than either single mutant. The triple mutant, Met50Phe/
Ile107Val/Lys213Arg, was more stable than the double mutant
or Met50Phe. The inactivation curves showed a biphasic character
that became more pronounced the more stable the mutant analyzed.
This may have resulted from some destabilizing chemical
modification(s) (e.g. deamidation) during the autolysis study
and/or reduced stabilization caused by complete digestion of
larger autolysis peptides. These alkaline autolysis studies have
been repeated on separately purified enzyme batches with essentially
the same results.
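
An autolytic half-time can be read from an inactivation time course as the time at which residual activity falls to 50%. The Python sketch below shows one simple way to do this, by linear interpolation between the two sampled points that bracket 50%. The function name and the example time course are hypothetical; because the curves are biphasic, fitting a single exponential would be a poorer assumption than direct interpolation.

```python
def half_time(times_min, activity_pct):
    """Return the time (min) at which residual activity crosses 50%.

    times_min:    sampling times in increasing order
    activity_pct: residual activity at each time, as % of the starting value
    Returns None if activity never falls to 50% within the time course.
    """
    points = list(zip(times_min, activity_pct))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        # Find the first interval that brackets 50% residual activity
        if a0 >= 50.0 >= a1 and a0 != a1:
            # Linear interpolation within the bracketing interval
            return t0 + (a0 - 50.0) * (t1 - t0) / (a0 - a1)
    return None


# Hypothetical time course, not data from this study
print(half_time([0, 60, 120, 180], [100, 70, 40, 20]))
```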

Rates of autolysis may depend both on the specific activity and
on the conformational stability of the subtilisin variant (Wells
and Powers, 1986). It was therefore possible that the decrease
in autolytic inactivation rates resulted from decreases in specific
activity of the apparently more stable mutants under alkaline conditions.
However, the more stable mutants, if anything, had
relatively higher specific activities than wild-type under alkaline
conditions and the less stable mutants had relatively lower
specific activities (Table II). These subtle effects on specific activity
for Ile107Val/Lys213Arg and Met50Phe/Ile107Val/
Lys213Arg are cumulative at both pH 8.6 and 10.8. The changes
in specific activity may reflect slight differences in substrate
specificity, although only positions 170 and 107 are near the
substrate binding site (Robertus et al., 1972).

Discussion 

Variable polymerase extension from a fixed primer permitted synthesis
in vitro of a uniform set of 3'-terminated ends onto which
misincorporation events could be focused on the subtilisin gene.
This approach is in contrast to whole plasmid mutagenesis, where
a single-stranded nick is produced randomly by treatment with
DNase I in the presence of ethidium bromide, and the 3' terminus
is exposed for misincorporation by limited digestion with
exonuclease III (Shortle et al., 1982). The former method is
necessarily more regionally efficient because the misincorporation
events are concentrated on the segment of interest instead of being
diluted over the entire plasmid. Furthermore, the wild-type background
arising from plasmid that is not efficiently nicked with
DNase I is avoided by starting with a single-stranded template
such as M13. Other groups have utilized φX174 (Zakour et al.,
1984) or M13 (Champoux, 1984; Kunkel, 1985; Skinner and
Eperon, 1986; Singh et al., 1986) templates to produce mutations
by misincorporation immediately behind or by fixed extension
from a synthetic oligonucleotide primer instead of behind a randomly
extended primer as employed here. The use of enrichment
procedures such as deoxyuridine-containing DNA (Kunkel,
1985), in vitro methylation (Pukkila et al., 1983; Horton and
Lord, 1986) and restriction-primer selection produced high
frequencies of mutagenesis over the subtilisin gene fragment
(~1 kb).

Table II. Relationship between relative specific activity at pH 8.6 or 10.8
and alkaline autolytic stability

                                Relative sp. act. a       Autolytic
Enzyme                          pH 8.6       pH 10.8      half-time (min) b

Wild-type                       100 ± 1      100 ± 3
Lys170Glu                        46 ± 1       28 ± 2       13
Ile107Val                       126 ± 3       99 ± 5      102
Lys213Arg                        97 ± 1      102 ± 1      115
Ile107Val/Lys213Arg             116 ± 2      106 ± 3      130
Met50Val                         66 ± 4       61 ± 1       58
Met50Phe                        123 ± 3      157 ± 7      131
Met50Phe/Ile107Val/Lys213Arg    126 ± 2      152 ± 3      168

a Relative sp. act. was the average from triplicate activity determinations
normalized to the wild-type subtilisin value at the same pH. The average sp.
act. of wild-type enzyme at pH 8.6 and 10.8 was 70 µmol/min/mg and
37 µmol/min/mg, respectively.

b Time to reach 50% activity was taken from Figure 2.

Fig. 3. Stereoview of the α-carbon diagram of subtilisin showing the positions of more alkaline stable mutants (Met50Phe, Ile107Val and Lys213Arg), less
alkaline stable mutants (Met50Val and Lys170Glu) and the catalytic Ser221.

Uncoupling the growth of cells from the high pH screen cir- 
cumvented the problem that the B. subtilis cells will not grow 
at pH 11.5. Use of a general protease substrate casein, instead 
of a specific synthetic substrate, avoided selecting effects of pH 
on substrate specificity. In principle, the alkaline screen should 
be capable of identifying mutants having both greater alkaline 
activity as well as alkaline stability. In fact, the mutants ident- 
ified as positive were all more stable and somewhat more alkaline 
active than wild-type. This may explain why the zones of proteo- 
lysis under alkaline conditions (not shown) for the positive mu- 
tants appeared more pronounced relative to wild-type than 
expected from the autolytic stability under alkaline conditions 
measured in vitro. 

Although the number of mutants characterized is too small to
generalize, a number of points are noteworthy. All mutants described
are located on, or are accessible to, the surface of the
molecule (Figure 3). Alber et al. (1987) have shown that random
mutations in buried positions of T4 lysozyme are more likely
to destabilize the protein compared to mutations at surface-accessible
sites. We probably missed mutant enzymes which
were extremely unstable, because we chose to screen only colonies
expressing active subtilisins at neutral pH. Position 50 is
located at the end of a β-sheet structure. The Met side chain makes
van der Waals contact with hydrophobic side chains of Trp113,
Val93 and Val95 (R.Bott and M.Ultsch, unpublished results).
Modelling of the Met50Val substitution shows a β-branched
constituent would make unfavourable steric contacts with main
chain carbonyl oxygens at positions 94 and/or 106. Model building
of a Met50Phe substitution shows the Phe side chain can make
a potentially more favourable hydrophobic interaction with
Trp113. Such an interaction is observed between Phe50 and
Trp113 in the X-ray crystal structure of the B. licheniformis subtilisin
(McPhalen et al., 1985). Improved packing interactions
between aromatic side chains can confer added stabilization to
proteins (Burley and Petsko, 1985).

Position 170 is located in a loop that forms part of the P1
binding site of subtilisin (Robertus et al., 1972). The Lys side
chain is close to Ser165 and Tyr167, and is 4 Å from Glu195 (NZ-170
to OE2-195). The Lys170Glu mutant probably causes unfavorable
electrostatic effects at high pH, where the neighbouring
side chains from Glu195 and Tyr167 would also be negatively
charged. Ile107 is in an α-helix that comprises part of the P4 binding
site (Robertus et al., 1972). The δ-methyl group is not in
contact with other atoms from the enzyme and it is unclear why
a Val substitution should marginally improve alkaline autolytic
stability. Lys213 is near the end of a β-ribbon (Figure 3). The
side chain is accessible to solvent and the X-ray structure indicates
that it does not make any intramolecular electrostatic interactions.
Models of Lys213Arg do not suggest the possibility of new
electrostatic contacts. It has been shown that enzymes can be
stabilized by guanidination of lysine side chains (Cupo et al., 1980).
In addition, we expect that at pH 12 the Lys side chain would
be neutral whereas Arg would be at least partially positively
charged (Tanford, 1962).
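
The charge argument can be made roughly quantitative with the Henderson-Hasselbalch relation. The sketch below assumes textbook side-chain pKa values of about 10.5 for Lys and 12.5 for Arg. These numbers are illustrative assumptions, not values taken from this work, and pKa values in a folded protein may differ.

```python
# Illustrative only: ASSUMED_PKA holds generic textbook values (assumptions),
# not measurements for subtilisin.
ASSUMED_PKA = {"Lys": 10.5, "Arg": 12.5}


def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a basic group in its protonated (positively charged) form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    for residue, pka in ASSUMED_PKA.items():
        print(residue, round(fraction_protonated(pka, 12.0), 3))
```

Under these assumed pKa values, only a few percent of a Lys side chain would remain charged at pH 12, while most of an Arg side chain would, in line with the qualitative expectation stated above.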

Autolysis of subtilisin is promoted by agents which disrupt the
conformational integrity of the molecule such as high temperature,
denaturants, high pH, and chelants which remove stabilizing
calcium ions (Ottesen and Svendsen, 1970; Voordouw et al.,
1976; Wells and Powers, 1986). It is possible that mutations
which stabilize the molecule to alkaline autolytic inactivation may
not affect thermal autolytic inactivation. In fact, preliminary
experiments indicate that the alkaline stable triple mutant,
Met50Phe/Ile107Val/Lys213Arg, is not significantly more
autolytically stable than wild-type subtilisin at 55°C in 50 mM
MOPS (pH 7), 50 mM NaCl and 2 mM CaCl2 (B.C.C. and
J.A.W., unpublished).

The single positive mutants identified are not dramatically more
stable than the wild-type. However, the fact that the stabilizing
effects from these positions are cumulative permits large improvements
in alkaline autolytic stability to be achieved. In addition,
the stabilizing mutation (Met50Phe) is at a site where a destabilizing
mutation has been identified from the random mutagenesis
(Met50Val). The Met50Phe substitution is found in other Bacillus
subtilisins sequenced (Markland and Smith, 1971; Nedkov et al.,
1983; Stahl and Ferrari, 1984). Thus, a strategy to improve
alkaline autolytic stability further can involve producing substitutions
(especially variant substitutions from natural extremophiles)
at sites found by random mutagenesis to be alkaline (and
perhaps thermally) sensitive, as was done at position 50.
While this manuscript was in press, a more thermally stable
variant of subtilisin from B. amyloliquefaciens (Asn218Ser) was
isolated by random mutagenesis (Bryan et al., 1987). A serine
at position 218 is found in natural variant subtilisins from thermophilic
sources. These studies further demonstrate the usefulness
of the random mutagenesis and screening approach for improving
the range and utility of enzymes for industrial purposes.

amylase A and a consensus alignment of mammalian, plant,
and bacterial α-amylases. The location of mutant amino acids
on the model indicates that mutations which destroy or
decrease the catalytic activity are particularly clustered:
(i) around the active site and along the substrate-binding
groove and (ii) in the interface between the central α/β barrel
and the C-terminal domain. Exposed loops are typically
tolerant towards mutations.

Key words: homology modelling/sequence alignment/starch 



Introduction 

α-Amylases (α-1,4-glucan-4-glucanohydrolase, EC 3.2.1.1) are
widely distributed starch-degrading enzymes found in plants,
animals and bacteria. Because of the importance of starch as a
raw material in a number of industries, particularly the food
industries, α-amylase is an enzyme of considerable commercial
significance. Despite this, there is relatively little information
concerning either structure or function of this interesting family
of enzymes. It is a common observation that even the availability
of a good three-dimensional structure does not necessarily make
it possible to predict the effects of specific amino acid changes,
even in the case where information about the enzyme mechanism
is available (Knowles, 1988). For this reason we have developed
a novel random mutagenesis method (Lehtovaara et al., 1988).
The method permits the enzymatic generation of a library of
single (and multiple) base mutations throughout a defined region
of DNA which can be several kilobases long. This method has
now been applied to the cloned gene of Bacillus stearothermophilus
α-amylase. A large number of mutations have been
identified by sequencing in the course of refining the
methodology.

The three-dimensional structures of two α-amylases, one from
Aspergillus oryzae, the so-called Taka-amylase A, and the other
from pig pancreas, have been determined by X-ray
crystallography (Matsuura et al., 1984; Buisson et al., 1988).
Both α-amylases have three domains: A, B and C. Overall, their
structures appear very similar, but some differences are observed
in the structure of domain B and in the orientation of domain
C relative to domain A (Buisson et al., 1988). The N-terminal
domain A corresponds to the well-known (α/β)8 barrel motif,
which was first seen in triose phosphate isomerase (Phillips et al.,
1978). In the middle of domain A, the chain loops out to make
domain B, which forms a lid above the barrel. The active site
is located at the top of the barrel on the C-terminal side of the
parallel β-sheet. Substrate analogues bind in this site (Matsuura
et al., 1984; Buisson et al., 1988), and there are three carboxylic
acids around this site which are conserved in all known α-amylase
sequences. In B.stearothermophilus α-amylase they correspond
to residues D234, E264 and D331. Matsuura et al. (1984)
propose that residues E231 and D297 of Taka-amylase A,
corresponding to D331 and E264 of B.stearothermophilus, are
the catalytic residues, analogously to the mechanism in lysozyme.
Buisson et al. (1988) propose that residues D196 and D300 of
porcine pancreatic α-amylase, corresponding to D234 and D331
of B.stearothermophilus, are catalytic. A binding site for calcium,
which is essential for activity (Steer and Levitzky, 1973), is
located between domains A and B. The C-terminal region is
folded into an antiparallel β-barrel and forms domain C. Although
it has proved possible to crystallize the enzyme from
B.stearothermophilus (Ogasahara et al., 1970), X-ray structures for the
bacterial α-amylases are not yet available.

It has been shown that tertiary structures are better conserved
in evolution than amino acid sequences (Chothia and Lesk, 1986).
Computer-aided molecular modelling has been used in a number
of cases to construct a model of a protein of unknown structure
which is related to a protein with a known structure [for a review
see Blundell et al. (1987)]. α-Amylase genes from a variety of
sources have been sequenced. Limited inter-species homology
has been reported to be located in short stretches around the active
site (Svensson, 1988). Using a sensitive method, we have found
sufficient homology over almost all of the amino acid sequence
to justify homology modelling of the B.stearothermophilus
sequence onto the Taka-amylase A structure. It is known that
the three-dimensional structures of porcine pancreatic α-amylase
and Taka-amylase A are similar. We show that the sequence
homology between B.stearothermophilus and Taka-amylase A
is more significant than between the former pair. The
Taka-amylase A structure was chosen because it is currently the only
publicly available α-amylase structure.

In this paper, we present a three-dimensional structural model
of B.stearothermophilus α-amylase based on the structure of
Taka-amylase A and show that the properties of the random
mutants can be consistently interpreted with the aid of this model.

Materials and methods 

Generation of point mutations 

The α-amylase gene from B.stearothermophilus ATCC 12980
was cloned as a 1.9 kb long fragment into the phage vectors
M13mp18 (coding strand) and M13mp19 (non-coding strand).
The insert contains the complete coding region of the α-amylase
gene (1650 bp). Random enzymatic mutagenesis of the cloned
DNA was performed according to the method of Lehtovaara
et al. (1988). The method can be summarized as follows. Starting
from oligonucleotide primers, a population of molecules is
synthesized under conditions where one nucleotide is limiting.
This gives a population of molecules each of which ends just
before the omitted base. After removal of all nucleotides,
misincorporation with reverse transcriptase is carried out under
optimized conditions to give, in principle, a 1:1:1 ratio of the
incorrect bases. Following synthesis of the second strand, the
mutated dsDNA is transfected into Escherichia coli. The mutant
library was generated in the course of developing the random
mutagenesis method. Five oligonucleotides (20-mers) at ~200
nucleotide intervals were used to mutagenize the A nucleotides
in the coding region of α-amylase in the M13mp19 construction.
With one of these primers the template nucleotides G and C were
also mutagenized. Only one primer was used to mutagenize T
and G nucleotides in the M13mp18 construction (covering amino
acids H208-L248).

Mutations that cover the sequence of Bacillus stearothermophilus
α-amylase were produced by an efficient in vitro
enzymatic random mutagenesis method and the mutant
α-amylases were expressed in Escherichia coli, which also
secreted the product. Ninety-eight mutants were identified by
sequencing and their enzyme activities were classified into
three classes: wild-type, reduced or null. A molecular model
of the enzyme was constructed using the coordinates of Taka-
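
The outcome of the misincorporation step can be pictured with a toy simulation. The sketch below is a deliberate simplification and not the published protocol: it assumes that every occurrence of the limiting base yields exactly one library member, and that the three incorrect bases are drawn with equal probability (the "in principle" 1:1:1 ratio). The function names are hypothetical.

```python
import random

BASES = "ACGT"


def misincorporate(seq: str, position: int, rng: random.Random) -> str:
    """Replace the base at `position` with one of the three incorrect bases."""
    wrong = [b for b in BASES if b != seq[position]]
    return seq[:position] + rng.choice(wrong) + seq[position + 1:]


def mutant_library(seq: str, target_base: str, rng: random.Random) -> list:
    """One single-base mutant per occurrence of the limiting (target) base."""
    return [
        misincorporate(seq, i, rng)
        for i, base in enumerate(seq)
        if base == target_base
    ]


if __name__ == "__main__":
    for mutant in mutant_library("GATTACA", "A", random.Random(0)):
        print(mutant)
```

Each simulated library member differs from the template at a single position that originally carried the targeted base, which is the property that lets one primer and one limiting nucleotide scan a stretch of sequence.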

Characterization of mutants 

A suitable dut+ ung+ strain of E.coli, such as JM109 or TG2,
was transfected with the mutagenized DNA and the cells were
plated on glucose minimal medium/H-top amylopectin azure
plates. Expression of functional recombinant α-amylase in E.coli
was detected as halos around the plaques. A three-grade scale
was used to assay the α-amylase activity as follows: ++ for
wild-type-size halos, + for smaller but visible halos and - for
non-detectable halos. Mutations were identified by DNA
sequencing.

Atomic coordinates for the starting model 
The atomic coordinates for the starting model were from the
structure of Taka-amylase A at 3.0 Å resolution [Matsuura et al.,
1984; available from the Protein Data Bank (Bernstein et al.,
1977) as entry 2TAA].

Sequence alignment 

Because of a high level of noise, automatic Needleman-Wunsch
(1970) procedures are not applicable to the alignment of sequences
that are only ~20% identical, which is the case for most sequence
pairs in our sample. The sensitive alignment program of Argos
(1987) was used for initial pairwise alignments. Discrepancies
between different pairwise alignments were resolved by visual
pattern matching to produce a consensus alignment of the
sequences. The positions of α helices and β strands were taken
for pig pancreatic α-amylase from Buisson et al. (1988), and for
Taka-amylase A they were calculated from the coordinates using
the DSSP program (Kabsch and Sander, 1983). Helices and
strands of pig pancreatic α-amylase and Taka-amylase A were
matched. Gaps were introduced to maximize similarity across
all sequences according to the following conservative substitution
groups: {P, G}, {S, T}, {E, D, N, Q}, {H, Y, F, W}, {A, I, L,
V, M, C}, {K, R}. At positions corresponding to a known helix
or strand, however, gaps were not allowed in any sequence. The
significance of pairwise alignments was assessed by the correlation
of five physical characteristics of amino acids (hydrophobicity,
bulkiness, turn preference, strand preference, refractivity
index) as given by Argos (1987). This measure proved quite
sensitive to small changes such as the movement of just a single
residue in a gap. Argos (1987) recommends that alignments
where the average correlation is >2.0 SD above random be
considered seriously. Adjustments were made to the alignment
until the value was about this threshold for the majority of
pairwise alignments.
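
The idea of scoring an alignment in standard deviations above random can be illustrated with a much simpler measure than the one used here. The sketch below does not implement the Argos (1987) correlation of five physical characteristics. As a stand-in, it counts aligned positions whose residues fall in the same conservative substitution group and compares that count with shuffled versions of one sequence. The function names, the number of shuffles and the shuffling scheme are all assumptions made for illustration.

```python
import random
import statistics

# Conservative substitution groups as listed in the text.
GROUPS = [set("PG"), set("ST"), set("EDNQ"), set("HYFW"), set("AILVMC"), set("KR")]


def same_group(a: str, b: str) -> bool:
    """True if both residues belong to one conservative substitution group."""
    return any(a in group and b in group for group in GROUPS)


def similarity(seq_a, seq_b) -> int:
    """Count aligned, ungapped positions whose residues share a group."""
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != "-" and b != "-" and same_group(a, b)
    )


def sd_above_random(seq_a: str, seq_b: str, trials: int = 200, seed: int = 0) -> float:
    """Observed similarity expressed in SD units above shuffled controls."""
    rng = random.Random(seed)
    observed = similarity(seq_a, seq_b)
    residues = list(seq_b)
    scores = []
    for _ in range(trials):
        rng.shuffle(residues)  # destroys positional order, keeps composition
        scores.append(similarity(seq_a, residues))
    spread = statistics.pstdev(scores)
    if spread == 0:
        return 0.0
    return (observed - statistics.mean(scores)) / spread
```

A pair of aligned segments scoring well above 2 with this toy measure would be "considered seriously" in the sense quoted above, although the published thresholds apply to the Argos correlation measure and not to this simplified count.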

Model building 

It was assumed that the backbone structure in aligned positions
is the same between Taka-amylase A and B.stearothermophilus
α-amylase. Loops involving gaps were patched using fragments
selected from the database of known structures (Jones and Thirup,
1986). One to three C(α) atoms at both ends of the loop were
matched to the template (derived from Taka-amylase A).
Different anchors were tried out until a fragment was found where
the fit to the C(α) atoms was good (deviation of superposed fragment
from anchor atoms <1 Å). Ideally, all main chain atoms of the
fragment and of the starting structure would overlap, but in
practice the overlap could be restricted to just one residue at both
ends. Main chain coordinates were replaced by those of the loop
starting from the second residue in the N-side anchor region and
ending with the second-last residue in the C-side anchor region.
Graphics was used to make sure that the selected loops packed
tightly to the rest of the structure. Structures that would clash
with the rest of the molecule were rejected as soon as they
were collected from the database. Side chain coordinates for
aligned identical residues were copied from the Taka-amylase
A structure; otherwise they were inserted in standard
conformation. The conformation of F346 and W492 was changed
to another staggered conformation because they initially clashed
with the main chain.
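
The acceptance test for a candidate loop fragment can be sketched as a distance check on the anchor atoms. The sketch assumes the fragment has already been superposed onto the template, and it assumes that "deviation" means the root-mean-square distance over the matched C(α) atoms. The text does not say whether an RMS or a maximum deviation was used, and the function names are hypothetical.

```python
import math


def rms_deviation(coords_a, coords_b) -> float:
    """Root-mean-square distance between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def fragment_fits(anchor_ca, fragment_ca, tolerance: float = 1.0) -> bool:
    """Accept a superposed fragment if its anchor C-alpha atoms deviate < tolerance (in Å)."""
    return rms_deviation(anchor_ca, fragment_ca) < tolerance
```

In practice the superposition itself (for example by a least-squares fit) would precede this check; it is left out here to keep the example short.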

Strain introduced to the model through side chain substitutions
and joining of loops was relaxed by the program CHARMm
(Brooks et al., 1983) using an adopted basis Newton-Raphson
energy minimization with a distance-dependent dielectric constant
and an 8 Å cutoff for non-bonded interactions. The calcium ion
was not included in the minimization.
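
The effect of a distance-dependent dielectric and a non-bonded cutoff on the electrostatic term can be shown in a few lines. The sketch below assumes the simplest form, ε(r) = r, so that the Coulomb energy falls off as 1/r², and uses an approximate conversion constant of 332.06 kcal·Å/(mol·e²). It ignores bonded exclusions, switching functions and every other term of a real force field, so it is an illustration and not a reproduction of the CHARMm calculation.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal*Å/(mol*e^2), approximate value (assumption)


def electrostatic_energy(atoms, cutoff: float = 8.0) -> float:
    """Pairwise Coulomb energy with dielectric eps(r) = r and a distance cutoff.

    `atoms` is a list of (x, y, z, charge) tuples; distances are in Å.
    """
    energy = 0.0
    for i in range(len(atoms)):
        xi, yi, zi, qi = atoms[i]
        for j in range(i + 1, len(atoms)):
            xj, yj, zj, qj = atoms[j]
            r = math.dist((xi, yi, zi), (xj, yj, zj))
            if r == 0.0 or r > cutoff:
                continue  # skip overlapping atoms and pairs beyond the cutoff
            # q_i * q_j / (eps(r) * r) with eps(r) = r gives a 1/r^2 dependence
            energy += COULOMB_CONSTANT * qi * qj / (r * r)
    return energy
```

With this form, interactions between distant charges are damped much more strongly than with a constant dielectric, which is a common way of mimicking solvent screening when no explicit water is present.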

A chain of six glucose units joined by α-1,4-linkages was built
in the interactive molecular modelling program Quanta (Polygen
Inc.). The dihedral angles in the aldose linkage were +90 and
-31° for O5-C1-O4-C4 and C1-O4-C4-C3 respectively. The
interaction energy of a water probe with the protein was calculated in
Quanta. Energy contours at an arbitrarily chosen level of
-2.0 kcal/mol formed a tube which passed by the catalytic
residues, and the glucose chain was fitted manually to it. Rotation
of one glycosidic bond was necessary to fit the chain in contact
with the catalytic residues (D234, E264, D331). The substrate
was relaxed by energy minimization keeping the protein part
frozen.
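
A dihedral angle such as O5-C1-O4-C4 is defined by four atom positions. A minimal sketch of the calculation is given below. It uses the standard atan2 formulation and follows the usual convention in which a clockwise rotation, viewed along the central bond, is positive. The helper names are hypothetical and the coordinates in the usage example are made up for illustration.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    b2_length = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / b2_length
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates arranged to give a +90 degree dihedral.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```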

Results 

Alignment of α-amylase sequences

The alignment of distantly related α-amylase sequences is far
from obvious because only six blocks of residues are clearly
conserved in them (Svensson, 1988). However, when the
correlation of physical properties of amino acids is used as a
measure, an acceptable alignment in terms of standard deviations
above random sequence comparison scores can be made over
the whole length of the sequences (Table I). The alignment in
Figure 1 shows that the best conserved regions are in domains
A and C, and these are segments around the catalytic residues
and secondary structure elements where particularly hydrophobic
patches are conserved. Although domain C lacks invariant
residues, the patterns of alternating hydrophobic and hydrophilic
residues are preserved in the β strands.

In the aligned α-amylase sequences (Figure 1) there are 31
positions where the nature of the amino acid is conserved. Of
these, 13 positions require hydrophobic residues, three are
invariantly glycine, three require an aromatic residue and seven
require an invariant charged residue.

In domain B there is very little amino acid homology. Further,
the arrangement of β strands in the two known structures, porcine
pancreatic α-amylase and Taka-amylase A, is distinctly different
(Buisson et al., 1988). In addition, in the amyloliquefying

Table I. Quality of the alignment

Sequence                  1     2     3     4     5     6     7     8     9    10
1 S.hygroscopicus         -   8.4   3.9   1.4   0.7   3.0   0.3   1.3   1.4  -0.1
2 Pig pancreas           38     -   5.0   1.2   1.4   2.8   1.6   2.1   2.3   1.7
3 B.subtilis             24    24     -   2.3   2.1   2.7   0.5   1.5   2.1   1.8
4 B.circulans            21    18    21     -   [?]   [?]   1.3   1.3   1.5   0.8
5 S.fibuligera           19    10    21    28     -  14.5   3.0   3.3   3.5   1.1
6 Taka-amylase           22    23    20    27    52     -   3.7   3.2   3.4   1.0
7 B.stearothermophilus   20    16    19    22    23    24     -  17.7  17.1   2.2
8 B.licheniformis        18    16    21    20    23    21    66     -  20.3   3.2
9 B.amyloliquefaciens    19    17    21    21    23    23    66    81     -   3.7
10 Barley                17    17    18    18    19    20    24    25    26     -

Upper part of the table: standard deviations above random for the alignment in Figure 1 over five parameters as given by the sensitive homology program of Argos
(1987). If 2.5 standard deviations is used as a criterion, all but barley are satisfactorily aligned with Taka-amylase A. The structure of Taka-amylase A (Aspergillus
oryzae) [...] for the model of B.stearothermophilus. Lower part of the table: percentage of identical [...] in sequence comparison. [...] positions account for at [...] the length of any sequence in any [...]


[Fig. 1: alignment of the ten α-amylase sequences listed in the legend, printed in three blocks with the boundaries of domain A, domain B, domain A and domain C marked above the sequences, B.stearothermophilus residue numbers above sequence 7, and conservation marks (* and +) below each block.]



Fig. 1. Sequence alignment of α-amylases from different sources. Aligned α-amylase amino acid sequences from: 1, Streptomyces hygroscopicus SF-1084
strain AA69-4 (Hoshiko et al., 1987); 2, pig pancreas (Pasero et al., 1986); 3, Bacillus subtilis strain N7 (Yamane et al., 1984); 4, B.circulans (Nishizawa
et al., 1987); 5, Saccharomyces fibuligera (Yamashita et al., 1985); 6, Aspergillus oryzae (Matsuura et al., 1984); 7, B.stearothermophilus; 8, B.licheniformis
(Yuuki et al., 1985); 9, B.amyloliquefaciens (Takkinen et al., 1983); 10, barley (Rogers, 1985). Five more mammalian α-amylase sequences are included in
the NBRF protein database (Barker et al., 1988). They have >80% identical residues with the pig pancreatic α-amylase and were therefore excluded from the
alignment. Identical amino acids in adjacent sequences are marked by a colon. A dot marks changes within the following groups: {P, G}, {S, T}, {E, D, N,
Q}, {H, Y, F, W}, {A, I, L, V, M, C}, {K, R}. Gaps are denoted by -. An asterisk below the alignment marks positions conserved in all sequences, and a
plus denotes positions where variability is limited to two different groups. α Helices and β strands are underlined in the known structures. Gaps correspond to
surface loops in the structure of Taka-amylase A. Residue numbers for B.stearothermophilus are shown above its sequence.
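
The conservation marks described in the legend can be derived mechanically from an alignment. The sketch below is one possible reading of the legend and not the authors' procedure: it assumes that an asterisk is given when all residues in a column fall within a single substitution group (identical residues included), that a plus is given when exactly two groups occur, and that any column containing a gap is left unmarked. The function names are hypothetical.

```python
# Substitution groups from the figure legend.
GROUPS = ("PG", "ST", "EDNQ", "HYFW", "AILVMC", "KR")
GROUP_OF = {aa: index for index, group in enumerate(GROUPS) for aa in group}


def column_mark(column) -> str:
    """Return '*', '+' or ' ' for one alignment column (assumed marking rules)."""
    if "-" in column:
        return " "  # assumption: columns with gaps are not marked
    groups = {GROUP_OF[aa] for aa in column}
    if len(groups) == 1:
        return "*"
    if len(groups) == 2:
        return "+"
    return " "


def annotate(alignment) -> str:
    """Build the line of marks printed beneath a block of aligned sequences."""
    return "".join(column_mark(column) for column in zip(*alignment))


if __name__ == "__main__":
    block = ["DAK-", "EIS-", "DLG-"]  # made-up sequences for illustration
    print(repr(annotate(block)))
```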

Bacillus α-amylases, domain B contains a 43-45 residue long
insert relative to Taka-amylase A. For these reasons the proposed
structure for domain B should be treated with considerable
caution.

Model building 

The alignment shown in Figure 1 was used to construct a
three-dimensional model of B.stearothermophilus α-amylase using the
structure of Taka-amylase A as a template (Figure 2). Due to
the lack of a suitable structural template, two segments have been
omitted from the model: the 45-residue insert in domain B (the
model joins residue 142 to residue 189) and the C-terminal
extension of 19 residues (residues 497-515). Without these two
segments, construction of the structural models was relatively
straightforward since the remaining shorter insertions and
deletions involve only a few residues. Insertions or deletions also
occur in places where the polypeptide chain folds back on itself
so that loops of different lengths are easy to accommodate. Thus
the structural changes made during modelling were minimized.
One difficult region was between amino acids 444 and 456, in
which a loop between two β strands in the Taka-amylase A
structure is shortened by five residues. This deletion could only
be incorporated by using a long extended loop which caused
concomitant restructuring of this region by displacing one β
strand. As a result, domain C appears more flat and open in our
model than in Taka-amylase A.

Mutant library 

Point mutations distributed throughout the α-amylase gene were
generated by an enzymatic misincorporation method (Lehtovaara
et al., 1988). In the work presented here, mutants were identified
by sequencing and subsequently assayed on starch plates for
enzyme activity. In this sample of 98 phenotypic mutants, there
are 31 which show no activity, 28 with decreased activity and
39 with wild-type enzymatic activity (Table II). Seventy-five
different residues are hit by at least one phenotypic mutation,




Fig. 2. Structural model of B.stearothermophilus α-amylase based on the
known structure of Taka-amylase A. Left: Taka-amylase A, right: model of
B.stearothermophilus α-amylase. Only the N, C(α) and C atoms of the
main chain are shown. The model was constructed based on sequence
homology to Taka-amylase A, the structure of which has been determined
by X-ray crystallography. A 45-residue insert from the back of domain B
and a 17-residue extension to the C terminus have been omitted. The light
blue sphere shows the location of the essential calcium ion as found in
Taka-amylase A. Red, green and yellow spheres mark the probable catalytic
residues. Loops 17-38 and 333-350 of Taka-amylase A and the
corresponding loops 15-19 and 370-374 of B.stearothermophilus
α-amylase are coloured yellow. Domain B is coloured orange and domain C
green in B.stearothermophilus α-amylase. Domain A (blue) is an (α/β)8
barrel.



and Figure 3 shows that they are distributed over most of the
protein. Clusters of residues at which mutations affect activity
are seen around the active site, and at the interface between

Table II. Mutants generated by the random mutagenesis method

Activity    Observed amino acids

(i) Mutations to stop codons






K48stop 







R126H E129stop(S131S) 






E129stop






K141stop 






KISSstop 






K257stop 






Y439F K442stop 







(ii) Mutations in the active site 

R232W 

D234G 

H238N 

(V236V) H238Y 
H238Y 

(L259L) T261A 



+ + 



T261S 


E264V 







E264D 


Y265S 







Y265S 






+ + 


Y265F 








D331A 








T332P 


E333D 




- 


E333A 






+ + 


(iii) Mutations in the substrate-binding groove






Y15F


D18V 




+ + 


Y15F 


D18A






Y15F 


D18V D19V T21S 


L22F 


+ 


Y15S 


D18V D19V T21P 


L22F 


+ 


Y15S 


D18A D19V T21P 


L22F 


+ 


D18V 


D19V 






D18V 


D19A 






D18A 


D19V T21S 




+ + 


D18A 


D19V T21P L22F 




+ 


S51R 


S53R 




— 


D54V 


(V55V) Y57C 




— 


D54A 


(V55V) Y57S (G58G) 




- 


D54V 


(V55V) Y57S (G58G) 




— 


Y60F 


D61V Y63F 






Y63C 






+ 


K237R 






+ + 


P334S 






+ +


Q336L 






+ 


Q336H 






+ + 


Q336P 






+ 


Y369S 






+ 


Y370S 






+ 


(iv) Mutations in or around the interface between domains A and C






D343A 






+ 


K347N 






+ 


Y351S 






+ 


D54L 






+ 


R413G 








R413W 


E414A 






R413W 


E414V 






E414V 








E414D 








T417P 


E418A 




+ 


(A426A) 


I428F 






S435F 








S435C 


K436E 




+ 


(v) Mutations in other buried sites (solvent-accessible surface of wild-type residue <50 Å²)






E13V 






+ + 



232: R 
234: D 
238: HG 



261: WIQTCSA 
264; E 
265: YVWI 

331: D 
332: TNM 
333: EQYP gap 



15: YERSK 
18: YDNQ gap 
19: DSC gap 
21: TQG gap 
22: LHWSTD gap 



51:SPEYVQIA 
53: SAVDNTGH 
54: DCTIQNS gap 
57: RKY gap 
60: YGW 
61: DRTHWE 
63: Y 

336: QTDSGA gap 



369: YMFEPR 
370: YFQEWR 



343: DQPNGS 
347: KVLIRY 
351: YTVQWN 
354: IFVAM 
413: RUCQ 
414: EDKRGS 



417: TS gap 
418: ES gap 
428: ILFA gap 
435: SAYLGW 
436: KDSQAFT 



D19V 




T21S 



+ + 



Bacillus stearothermophilus a-amylase 



Table II (continued). 



Activity Observed amino acids 



D19V 


T21P (L22L) T24P 


+ + 




E29D 




+ 


29: EDKA 


N32I 


L33F S35C 


+ 


32: NYHDMC 


L33F 






33: LIGKTE 


L33F 


S35C 


+ 


35: SEDAGFIYQ 


K48T 




+ + 




E129G 




+ + 




Q254P 




+ +




I277F 




+ + 




L291F 






291: LTIFS 


G335C






335: GFN gap 


G335C 


A337T 




337: ASQE 


A337T 




+ + 




(vi) Mutations in other exposed sites (solvent-accessible surface of wild-type residue >50 Å²)







L22F 

K25N 

N31Y 

N127T 

T133S 

Y134S 

Y134F

Q135L 

Q137P 

A138T 

N149I 

H208L 

T213A 

K216E 

S217G 

S217C 

K220R 

T225P 

Y250F

T255P 

K257T 

K257Q 

K257M 

K419N 

K419T 

Y439F 

K442N 

Q443H 



Y134F 



K442Q 



+ 
+ 

+ + 
+ + 

+ + 
+ + 
+ + 

+ + 

+ 
+ 

+ + 
+ +

+ 

+ + 
+ + 



25: KRMG gap 

31: NAED 

127: NLQTWDF 



225: TEDNMI gap 
250: YHAVST 
255: TSA gap 



419: KVA gap 
439: YAGIVSQLE 
442: KRLSDNQA gap 



The wild-type amino acid is given on the left and the amino acid it is mutated to is given on the right of the residue number. Deleterious substitutions are
printed in bold. Note that some of the mutants have multiple substitutions; comparisons with corresponding single mutants sometimes suggest the causative
change. In clusters (like the one around residue 20), where the causative mutations are less clear, the primary suspects are printed in italics. Consequently, all
other mutations have been interpreted as being neutral. Silent mutations seen only at the DNA level are in parentheses. Enzyme activities of the mutant clones
were estimated on starch plates as compared to the wild-type clone: + +, wild-type activity; +, less than wild-type but greater than zero; -, zero activity. No
clone showed drastically elevated activity. Mutants are listed grouped according to their location in the structural model. The solvent-accessible surface area
was calculated using the program DSSP (Kabsch and Sander, 1983) from the model of wild-type B. stearothermophilus α-amylase. Residues 1-104 and
208-400 belong to domain A, residues 105-207 to domain B and residues 401-515 to domain C. For convenience of comparison to other α-amylase
sequences, the right-most column lists for + and - mutants which amino acids are observed in natural α-amylases at given positions. The data for this listing
also include five mammalian sequences (PIR codes ALHUS, ALHUP, ALMSS, ALMSP, ALRTP) which are highly homologous to pig pancreatic α-amylase
and which were excluded from Figure 1.
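The notation and domain boundaries given in the footnote to Table II can be sketched in a few lines of Python. This is only an illustrative helper, not part of the original work: the names `Substitution`, `parse_substitution`, `domain_of` and `is_silent` are hypothetical, and the only facts taken from the text are the label format (wild-type residue, position, mutant residue), the use of parentheses for silent mutations and the residue ranges of domains A, B and C.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """One amino acid substitution, e.g. E264D (hypothetical helper type)."""
    wild_type: str
    position: int
    mutant: str


# One-letter wild-type residue, residue number, one-letter mutant residue.
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a Table II style label; parentheses (silent mutations) are ignored."""
    match = _LABEL.match(label.strip().strip("()"))
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def domain_of(position: int) -> str:
    """Assign a residue to domain A, B or C using the ranges in the footnote."""
    if 1 <= position <= 104 or 208 <= position <= 400:
        return "A"
    if 105 <= position <= 207:
        return "B"
    if 401 <= position <= 515:
        return "C"
    raise ValueError(f"position {position} is outside residues 1-515")


def is_silent(sub: Substitution) -> bool:
    """A silent mutation leaves the amino acid unchanged, e.g. (A426A)."""
    return sub.wild_type == sub.mutant


if __name__ == "__main__":
    for label in ["E264D", "I428F", "Y134F", "(A426A)"]:
        sub = parse_substitution(label)
        print(label, sub, "domain", domain_of(sub.position), "silent:", is_silent(sub))
```

Grouping the parsed substitutions by `domain_of` gives a quick cross-check of the domain assignments used in the text, for example that I428F lies in domain C and I354L in domain A.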



domains A and C. The relationship between the effects of 
different mutations and their location on the model is discussed 
in more detail below. 

Active site 

There are many conserved and buried charged residues around
the active site. D331 is relatively accessible to solvent but the
other two proposed catalytic residues, D234 and E264, are close
neighbours and appear almost buried at the bottom of a pocket.
The most striking mutation in this region is the chemically
homologous substitution E264D, which is inactive (in a double
mutant with Y265S, which as a single mutation has wild-type
activity). This strongly suggests that E264 is essential for
catalysis. In addition, the mutants D331A and D234G are both
inactive, suggesting that the carboxyl groups of all three
residues, D234, E264 and D331, play a role in the enzymatic
mechanism. In the energy-minimized model, R232 is in the centre
of a network of hydrogen bonds and salt bridges, which may fix
the catalytic residues in the correct orientation for catalysis.
R232 is connected to D234, D101, H330 and E264, and further to
W42, D331, N329 and D288. All of these residues are highly
conserved. In other α-amylases, W42 can be replaced by Q, N329
by S and D288 by N, E or A, but all the other mentioned residues
are invariant (Figure 1). The mutants R232G and R232W are
inactive, thus demonstrating the functional importance of this
conserved arginine residue.

Fig. 3. Location of mutations in the structural model of B. stearothermophilus α-amylase. Residues which upon mutation produce (A) loss of activity
(- mutants), (B) reduced activity (+ mutants) or (C) wild-type activity (+ + mutants). The side chain atoms are drawn as spheres with 0.5 times their van
der Waals radius (atom colouring: C green, N blue, O red). Most - and + mutants are clustered around the active site and the interface between domains A
and C.

Substrate-binding groove 

In the model there is a clearly visible groove winding across the
top of the α/β barrel in domain A, into which a six-unit amylose
chain was docked. Because D234 and E264 are at the bottom
of a deep pocket, the long substrate chain could only be brought
into contact with them by introducing a sharp bend in the chain.
The glucose units make extensive contacts with a number of
residues equivalent to those proposed for Taka-amylase A
(Matsuura et al., 1984) and porcine pancreatic α-amylase
(Buisson et al., 1988). Those residues that are in contact with
the substrate at the subsites adjacent to the hydrolyzed bond are
highly conserved in all α-amylases, but more variation is allowed
at the other subsites.

In Taka-amylase A the region of the substrate-binding groove
furthest from the catalytic site is blocked by the adjacent
protruding loops 17-38 and 333-350 (residue numbers
according to the Taka-amylase A sequence). They correspond
to the much shorter loops 15-19 and 370-374 in
B. stearothermophilus α-amylase (Figure 2). We speculate that the
'missing' subsites might explain the fact that Taka-amylase A
is more saccharifying than Bacillus α-amylases (Priest, 1984)
by biasing the substrate specificity of Taka-amylase A towards
residues at the end of the amylose chain.

In the model of B. stearothermophilus α-amylase there are
notably many tyrosine side chains lining the substrate-binding
groove. Several mutations, single or multiple, in these residues
(Y15, Y57, Y60, Y63, Y369, Y370) lead to decreased activity
(Table II). From its location and mutant data (Table II), Q336
is also implicated in substrate binding in B. stearothermophilus
α-amylase, though it has no counterpart in Taka-amylase A, for
example (Figure 1). W266 is in contact with the docked substrate.
It corresponds to tryptophan 206 in the barley α-amylase, which
has been pinpointed as essential for enzymic function in
biochemical mapping (Gibson and Svensson, 1987). Interestingly,
in some α-amylases this position is occupied by hydrophobic
residues other than tryptophan.

Interface between domains A and C 

A second major cluster of inactivating mutations is found in and
around the interface between domains A and C, ~30 Å away
from the catalytic site (Table II and Figure 3). In low-resolution
X-ray studies of porcine pancreatic α-amylase a second binding
site for substrate analogues has been identified near the
N-terminus of the eighth helix in domain A that is facing domain
C (Payan et al., 1980). The function of this site is unknown,
but it has been implied that it may play a role either in anchoring
the enzyme to starch or in the regulation of activity (Buisson
et al., 1981). In our model both salt bridges and hydrophobic
contacts are found across the interface between domains A and
C. The latter interactions are affected in the inactive mutant
I428F, which is located on a strand in domain C, and in the
mutant I354L, which is on the seventh α helix in domain A and
has much decreased activity although this amino acid substitution
is conservative. The dramatic effects of such 'innocent-looking'
mutations seem to imply that the exact orientation of domain C
relative to domain A is crucial for activity. In the model of
B. stearothermophilus α-amylase these two residues are opposite
each other. Many combinations of hydrophobic residues are found at
these positions in the other α-amylases (Table II, Figure 1).
Since, particularly in domain C, many gaps were introduced into
the alignment (Figure 1), the structural details of domain C and
its contacts with domain A are obviously different in the different
enzymes. This is clearly illustrated by comparing the known
structures of Taka-amylase A and porcine pancreatic α-amylase
(Buisson et al., 1988). That domain C is important for activity
is also shown by the effect of a mutation that causes a premature
stop codon at residue 442. This removes roughly two-thirds of
domain C and leads to inactivation of the enzyme.

Calcium-binding site 

The structure of the calcium-binding site between domains A and
B is similar in the two known structures, and the invariant D203
is one of the ligands of the calcium ion (Buisson et al., 1988).
D105 and the main-chain carbonyl oxygen from the invariant
H238 are other ligands. D105 is replaced by an asparagine in
the other α-amylases (Figure 1). The importance of H-bonds
involving the side chain of H238 is suggested by the mutations
H238N, which is still functional, and H238Y, which is inactive
(Table II).

Buried sites are typically more sensitive to mutation than 
exposed sites 

The great majority of amino acid replacements which do not much
affect the activity of the enzyme are located at exposed sites in
the 3-D model. Mutations affecting the active site region and the
A-C interdomain region which inactivate α-amylase were
described above. Several other examples, where even conservative
mutations are deleterious, were found at buried sites. For
example, the inactive mutant L33F and the mutant L291F, which
has reduced activity, are located in the interior of domain A.
Interestingly, the substitution corresponding to L291F is found
in B. circulans α-amylase. An examination of the structure of
Taka-amylase A and models of B. circulans and
B. stearothermophilus α-amylase indicated that Phe can be
accommodated at this position in the core of B. circulans
α-amylase due to a number of compensating substitutions to smaller
residues in its neighbourhood, but it could not be fitted to the
other two structures without steric clashes.

E29D, located at a partially buried site, has reduced activity. 
In the three-dimensional model E29 forms a salt bridge to K25. 
This seems to be an important interaction in the B. stearothermo- 
philus enzyme, since the mutant K25N is inactive. The other 
liquefying Bacillus enzymes have D at position 29, but the 
reduction in size of this residue is compensated by R at position 
25. No corresponding salt bridge is present in the structure of 
Taka-amylase A. 

At certain sites proline may be accommodated without effects
on the activity, and at other sites it cannot be accommodated
(Table II). For example, the mutations Q254P, leading to fully
active enzyme, and T255P, leading to inactive enzyme, are in
neighbouring residues in a surface loop.

Interactive effects in multiple mutations 

Table II shows that generally the effects of amino acid substi- 
tutions are additive. There are, however, a number of exceptions. 
For example, G335C alone gives an inactive enzyme, but activity 
is recovered if A337T is also present. A337T alone has no effect 
on activity. Other interesting examples include the clusters of 
mutations around residues D18 and L33. 
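One way to screen a table of this kind for interactive effects is sketched below. The text does not define 'additive' formally, so the rule used here is an assumption made purely for illustration: on the three-level plate scale (-, +, + +, written here as "-", "+" and "++"), a multiple mutant is expected to be as impaired as its most deleterious single substitution, and anything else is flagged. The names `expected_level` and `is_non_additive` are hypothetical.

```python
# Ordinal ranking of the plate-assay activity levels used in Table II.
LEVELS = {"-": 0, "+": 1, "++": 2}


def expected_level(single_levels):
    """Expected level of a multiple mutant under the ASSUMED rule that the
    most deleterious single substitution dominates."""
    return min(single_levels, key=LEVELS.__getitem__)


def is_non_additive(single_levels, observed):
    """True if the observed level differs from the expectation above."""
    return LEVELS[observed] != LEVELS[expected_level(single_levels)]


if __name__ == "__main__":
    # G335C alone is inactive ("-"); A337T alone has no effect ("++").
    singles = ["-", "++"]
    print(expected_level(singles))
    # The text states only that activity is recovered in the double mutant;
    # "+" is a placeholder standing for any level above "-".
    print(is_non_additive(singles, observed="+"))
```

Under this rule the G335C/A337T pair is flagged as an exception whenever the double mutant shows any activity, which matches the compensation described above.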



Discussion 

We describe here the construction of a molecular model of
B. stearothermophilus α-amylase using a novel alignment of
amino acid sequences from many distantly related α-amylases
and the available information on secondary/tertiary structures.
We also present the phenotype and spatial location of 98 separate
randomly generated amino acid mutants and note that inactivating
mutations are clustered in the model, revealing functionally
important regions. Thus, the model provides a rationale for
interpretation of the mutations and, vice versa, data on the
mutants provide evidence supporting the model.

Porcine pancreatic α-amylase and Taka-amylase A have only
23% identical amino acids (Table I) but very similar
three-dimensional structures. We have assumed that all α-amylases,
which have typically ~20% identical residues, share this
common structural design. There are several well known
examples of protein families, such as the globins and the
trypsin-like serine proteases, where the sequences can diverge even
more but the three-dimensional structures are well conserved. The
correct alignment of the amino acid sequences to identify residues
occupying topologically equivalent positions is critical to the
success of homology modelling (Read et al., 1984). Argos (1987)
has described a cautionary example concerning the pairwise
alignment of lactate and malate dehydrogenase: the sequence
alignment optimal in terms of amino acid identity does not
coincide with the alignment derived from the optimal
superposition of the tertiary structures. However, the alignment of
α-amylases described here is supported by structural information
concerning calcium ligands and secondary structure. In addition,
amino acids of similar physical properties are well aligned. At
least 70% of residues in all sequence pairs are aligned in Figure 1
and, statistically, the alignment is significant (Table I). It is
generally accepted that if the sequence of a protein with unknown
tertiary structure can be related to one with known architecture,
then the main-chain folding of the latter provides an adequate
base for structural modelling (Blundell et al., 1987). However,
in the course of evolution, mutations are accommodated in protein
structures through small shifts in the relative position of secondary
structure elements (Chothia and Lesk, 1982a). Such effects cannot
be accounted for in modelling procedures which use a rigid
framework. Refinement by energy minimization results only in
small shifts in relative atomic positions, typically of the order
of 1 Å at most. Thus, it is clear that this model represents only
one possible low-energy conformation of the protein. Despite
these reservations, we were pleased to note that the α-amylase
model is consistent with biochemical data obtained from random
mutagenesis.

For this work, it was more important to gather information
on a large number of mutants than to analyze a few in detail.
Mutants generated by random mutagenesis were identified by
sequencing, since it was important to understand what types of
base changes the misincorporation procedure produced
(Lehtovaara et al., 1988). Around 40% of phenotypic mutants
in the sample described here retained wild-type activity. The
remainder had reduced activity or were inactive.

The loss of enzyme activity observed in the plate assay could
be caused by: (i) impaired function of secreted enzyme or (ii) a
reduction in the amount of enzyme due to destabilization of the
secreted protein or impaired folding and even degradation inside
E. coli. Mutational analysis of a number of proteins has shown
that amino acids which are critical to protein stability are generally
relatively rigid and inaccessible to solvent in the folded protein
whilst substitutions at mobile and exposed sites usually have little
effect on protein stability (reviewed by Alber, 1989). Exceptions
are the introduction of prolines or the replacement of
conformationally special glycines, which can propagate shifts in
the folded structure even if the mutation is at an exposed site
(Alber, 1989). Our interpretations concerning the effects of the
mutations are based on the location and interactions of the mutated
residues in the structural model. Mutations which apparently
impair function are clustered together in three-dimensional space
at two surface sites: the active site cleft and the domain
A-domain C interface. In the substrate-binding groove, a
number of exposed residues, which may form hydrogen bonds
to the substrate, were shown to be sensitive to mutation. The
catalytic center and calcium-binding site are also identified from
the known X-ray structures and by sequence conservation. Seven
absolutely conserved charged residues in the family of α-amylases
are located here. Four of these residues were mutated in our
sample and led to inactive enzyme in each case. Apart from the
clusters of deleterious mutations mentioned above, exposed
residues were found to be less sensitive to mutations than buried
ones. A number of mutations to proline and mutations substituting
a larger side chain at buried sites were apparently deleterious
because they disrupt the tertiary structure.

The pH dependence of α-amylase suggests a lysozyme-type
mechanism employing two carboxylic acids, one ionized and the
other protonated in the active form. The adjacent E264 and D234
are buried, and therefore may have elevated pKa values, whereas
D331 is more exposed and is expected to have a normal pKa.
Matsuura et al. (1984) have argued that in Taka-amylase A two
carboxyl groups in different environments are required for activity
and therefore the residue corresponding by homology to D331
in B. stearothermophilus α-amylase is a catalytic residue. The
proposals by Matsuura et al. (1984) and Buisson et al. (1988)
agree on the role of D331 as a catalytic residue in α-amylases
but differ with respect to the roles of the neighbouring residues
E264 and D234. Buisson et al. (1988) reason that D234, which
has more direct interactions with the calcium-binding site, is likely
to be the other catalytic residue. However, our mutant data very
strongly suggest that E264 is a catalytic residue.

Domain C is shown here to be indispensable for activity since
a cluster of mutations far from the active site reduces enzyme
activity. Subtle conformational changes at the interface between
domains A and C, which we assume accompany the mutations
I428F and I354L, seem to have a large effect on activity. It has
been proposed for mammalian α-amylases that a site at the
domain A-domain C interface might have a regulatory role in
that substrate analogues bind to this region and affect activity
(Payan et al., 1980; Buisson et al., 1981). It is not clear why
a bacterial exoenzyme should require this type of regulation. We
therefore propose that domain C plays an important role in starch
hydrolysis by orientating the active site cleft of domain A
correctly with respect to the amylose chain.

In conclusion, we suggest that this approach of integrating
computer-aided molecular modelling with the rationalization of
random mutagenesis data can contribute substantially to an
understanding of protein structure and function, even in cases
like this where the crystal structure of the protein is not available.